Structural speaker adaptation using maximum a posteriori approach and a Gaussian distributions merging technique
Abstract
The aim of speaker adaptation techniques is to enhance speaker-independent acoustic models so that their recognition accuracy comes as close as possible to that obtained with speaker-dependent models. Recently, a technique based on a hierarchical structure and the maximum a posteriori criterion (SMAP) was proposed. In this paper, as in SMAP, we assume that the acoustic model parameters are organized in a tree containing all the Gaussian distributions. Each node in that tree represents a cluster of Gaussian distributions sharing a common affine transformation that represents the mismatch between training and test conditions. To estimate this affine transformation, we propose a new technique based on merging Gaussians and standard MAP adaptation. This new technique is very fast and allows good unsupervised adaptation of both means and variances, even with a small amount of adaptation data. This adaptation strategy has shown a significant performance improvement in a large-vocabulary speech recognition task, both alone and combined with MLLR adaptation.
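The abstract names two building blocks, merging Gaussians and standard MAP adaptation, without giving formulas. As a rough illustration only, the sketch below shows one common form of each: a moment-matched merge of diagonal-covariance Gaussians and the usual MAP interpolation of a mean. The function names, the prior weight `tau`, and the toy values are assumptions made here for illustration, not the paper's actual estimation procedure for the per-node affine transformation.

```python
def merge_gaussians(weights, means, variances):
    """Moment-matched merge of diagonal-covariance Gaussians into one Gaussian.

    Assumption: each Gaussian is given as a list of per-dimension means and
    variances, with a non-negative weight (e.g. its occupancy count).
    """
    total = sum(weights)
    dim = len(means[0])
    merged_mean = [
        sum(w * m[d] for w, m in zip(weights, means)) / total
        for d in range(dim)
    ]
    # Merged variance = weighted average of (variance + squared mean offset).
    merged_var = [
        sum(w * (v[d] + (m[d] - merged_mean[d]) ** 2)
            for w, m, v in zip(weights, means, variances)) / total
        for d in range(dim)
    ]
    return merged_mean, merged_var


def map_mean_update(prior_mean, occupancy, data_mean, tau=10.0):
    """Standard MAP mean update: interpolate between prior and data means.

    Assumption: tau is a hypothetical prior weight; a larger tau keeps the
    estimate closer to the prior when little adaptation data is available.
    """
    return [
        (tau * p + occupancy * x) / (tau + occupancy)
        for p, x in zip(prior_mean, data_mean)
    ]


# Toy one-dimensional example with made-up values.
mean, var = merge_gaussians([1.0, 1.0], [[0.0], [2.0]], [[1.0], [1.0]])
adapted = map_mean_update(mean, occupancy=10.0, data_mean=[2.0], tau=10.0)
```

With equal weights, the merged Gaussian sits midway between the two components and its variance grows to account for their separation. The MAP update then moves the merged mean toward the adaptation-data mean in proportion to the amount of data observed.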
Similar resources
Structural linear model-space transformations for speaker adaptation
Within the framework of speaker adaptation, a technique based on a tree structure and the maximum a posteriori criterion (SMAP) was proposed. In SMAP, parameter estimation at each node of the tree is based on the assumption that the mismatch between the training and adaptation data is a Gaussian PDF whose parameters are estimated using the maximum likelihood criterion. To avoid poor tran...
Speaker normalization and adaptation based on linear transformation
We propose novel speaker independent (SI) modeling and speaker adaptation based on a linear transformation. An SI model and speaker dependent (SD) models are usually generated using the same preprocessing of acoustic data. This straightforward preprocessing causes a serious problem. Probability distributions of the SI models become broad and the SI models do not give good initial estimates for ...
Improved speaker verification through probabilistic subspace adaptation
In this paper we propose a new adaptation technique for improved text-independent speaker verification with limited amounts of training data using Gaussian mixture models (GMMs). The technique, referred to as probabilistic subspace adaptation (PSA), employs a probabilistic subspace description of how a client’s parametric representation (i.e. GMM) is allowed to vary. Our technique is compared t...
Discriminative Transformation for Sufficient Adaptation in Text-Independent Speaker Verification
In conventional Gaussian Mixture Model – Universal Background Model (GMM-UBM) text-independent speaker verification applications, the discriminability between speaker models and the universal background model (UBM) is crucial to the system's performance. In this paper, we present a method based on heteroscedastic linear discriminant analysis (HLDA) that can enhance the discriminability between spea...